Abstract: We consider the finite sample properties of the regularized
high-dimensional Cox regression via lasso. Existing literature focuses on linear
models or generalized linear models with Lipschitz loss functions, where the 
empirical risk functions are the summations of independent and identically 
distributed (iid) losses. The summands in the negative log partial likelihood 
function for censored survival data, however, are neither iid nor Lipschitz. 
We first approximate the negative log partial likelihood function by a sum of 
iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities 
for the lasso penalized Cox regression using pointwise arguments to tackle 
the difficulty caused by the lack of iid and Lipschitz property. 
1. Introduction

Since it was introduced by Tibshirani (1996), the lasso regularized method
for high-dimensional regression models with sparse coefficients has received a
great deal of attention in the literature. Properties of interest for such
regression models include the finite sample oracle inequalities. Among the
extensive literature of the lasso method, Bunea, Tsybakov, and Wegkamp (2007) and
Bickel, Ritov, and Tsybakov (2009) derived the oracle inequalities for prediction
risk and estimation error in a general nonparametric regression model including
the high-dimensional linear regression as a special example, and van de Geer
(2008) provided oracle inequalities for the generalized linear models with
Lipschitz loss functions, e.g. logistic regression and classification with hinge loss.

We consider lasso regularized high-dimensional Cox regression. Let $T$ be the
survival time and $C$ the censoring time. Suppose we observe a sequence of iid
observations $(Y_i, \Delta_i, X_i)$, $i = 1, \dots, n$, where $Y_i = T_i \wedge C_i$,
$\Delta_i = I_{\{T_i \le C_i\}}$, and $X_i$ are the covariates in $\mathcal{X}$.
Due to largely parallel material, we follow closely the notation in van de Geer
(2008). Let

$$\mathcal{F} = \Big\{ f_\theta(x) = \sum_{k=1}^m \theta_k \psi_k(x), \ \theta \in \Theta \Big\}.$$

Here $\Theta$ is a convex subset of $R^m$, and the functions $\psi_1, \dots, \psi_m$
are real-valued basis functions on $\mathcal{X}$, which are identity functions of
the corresponding covariates in a standard Cox model.

Consider the following Cox model (Cox, 1972): 

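$$\lambda(t \mid X) = \lambda_0(t) e^{f_\theta(X)},$$

where $\theta$ is the parameter of interest and $\lambda_0$ is the unknown baseline
hazard function. The negative log partial likelihood function for $\theta$ becomes

$$l_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \Big\{ f_\theta(X_i) - \log\Big[ \frac{1}{n} \sum_{j=1}^n 1(Y_j \ge Y_i) e^{f_\theta(X_j)} \Big] \Big\} \Delta_i. \qquad (1.1)$$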



The corresponding estimator with lasso penalty is denoted by 



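$$\hat\theta_n := \arg\min_{\theta \in \Theta} \{ l_n(\theta) + \lambda_n \hat I(\theta) \},$$

where $\hat I(\theta) := \sum_{k=1}^m \hat\sigma_k |\theta_k|$ is the weighted $l_1$
norm of the vector $\theta \in R^m$, with random weights
$\hat\sigma_k := [\frac{1}{n} \sum_{i=1}^n \psi_k^2(X_i)]^{1/2}$.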
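
To make the objective concrete, the following Python sketch evaluates the negative log partial likelihood (1.1) and the weighted $l_1$ penalty with NumPy. The function names, the array layout (`psi` is an n x m matrix holding $\psi_k(X_i)$), and the 1/n normalization inside the logarithm are assumptions made for this illustration; it does not handle ties specially and performs no optimization.

```python
import numpy as np

def neg_log_partial_likelihood(theta, psi, y, delta):
    """Evaluate l_n(theta) in the form of (1.1) (illustrative helper)."""
    f = psi @ theta                           # f_theta(X_i), shape (n,)
    at_risk = y[None, :] >= y[:, None]        # at_risk[i, j] = 1(Y_j >= Y_i)
    # Average of 1(Y_j >= Y_i) * exp(f_theta(X_j)) over j, for each i.
    risk_avg = (at_risk * np.exp(f)[None, :]).mean(axis=1)
    return -np.mean(delta * (f - np.log(risk_avg)))

def empirical_weights(psi):
    """Random weights sigma_hat_k = [mean_i psi_k(X_i)^2]^(1/2)."""
    return np.sqrt(np.mean(psi ** 2, axis=0))

def lasso_objective(theta, psi, y, delta, lam):
    """l_n(theta) + lam * sum_k sigma_hat_k |theta_k|."""
    penalty = np.sum(empirical_weights(psi) * np.abs(theta))
    return neg_log_partial_likelihood(theta, psi, y, delta) + lam * penalty
```

Because the penalty is nondifferentiable at zero, a practical fit would pass `lasso_objective` to a proximal or coordinate-descent solver rather than a plain gradient method.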

Clearly the negative log partial likelihood is a sum of non-iid random
variables. For ease of theoretical calculation, it is natural to consider the
following intermediate function as a "replacement" of the negative log partial
likelihood function:

$$\tilde l_n(\theta) = -\frac{1}{n} \sum_{i=1}^n \{ f_\theta(X_i) - \log \mu(Y_i; f_\theta) \} \Delta_i, \qquad (1.2)$$

which has the desirable iid structure, but with an unknown population
expectation

$$\mu(t; f_\theta) = E_{X,Y}\{ 1(Y \ge t) e^{f_\theta(X)} \}.$$
The negative log partial likelihood function (1.1) can then be viewed as a
"working" model for the empirical loss function (1.2), and the corresponding
loss function becomes

$$\gamma_{f_\theta} = \gamma(f_\theta(X), Y, \Delta) := -\{ f_\theta(X) - \log \mu(Y; f_\theta) \} \Delta, \qquad (1.3)$$

with expected loss

$$l(\theta) = -E_{Y,\Delta,X}[\{ f_\theta(X) - \log \mu(Y; f_\theta) \} \Delta] = P\gamma_{f_\theta}, \qquad (1.4)$$

where $P$ denotes the distribution of $(Y, \Delta, X)$. Define the target function
$\bar f$ by

$$\bar f := \arg\min_{f \in \mathbf{F}} P\gamma_f,$$

where $\mathbf{F} \supseteq \mathcal{F}$. For simplicity we will assume that there is
a unique minimum as in van de Geer (2008). Uniqueness holds for the regular Cox
model when $\mathbf{F} = \mathcal{F}$; see, for example, Andersen and Gill (1982).
Define the excess risk of $f$ by

$$\mathcal{E}(f) := P\gamma_f - P\gamma_{\bar f}.$$

It is desirable to show similar non-asymptotic oracle inequalities for the Cox
regression model as in, for example, van de Geer (2008) for generalized linear
models. That is, with large probability,

$$\mathcal{E}(f_{\hat\theta_n}) \le \text{const.} \times \min_{\theta \in \Theta} \{ \mathcal{E}(f_\theta) + \mathcal{V}_\theta \}.$$

Here $\mathcal{V}_\theta$ is called the "estimation error" by van de Geer (2008),
which is typically proportional to $\lambda_n^2$ times the number of nonzero
elements in $\theta$.

Note that the summands in the negative log partial likelihood function (1.1)
are not iid, and the intermediate loss function $\gamma(\cdot, Y, \Delta)$ given in
(1.3) is not Lipschitz. Hence the conclusion of van de Geer (2008) cannot be
applied directly. With the Lipschitz condition in van de Geer (2008) replaced by
a similar boundedness assumption for regression parameters in Bühlmann (2006),
we tackle the problem using pointwise arguments to obtain the oracle bounds of
two types of errors: one is between empirical loss (1.2) and expected loss (1.4),
and one is between the negative log partial likelihood (1.1) and empirical loss
(1.2).

The article is organized as follows. In Section 2, we provide assumptions 
and additional notation that will be used throughout the paper. In Section 3, 
following the flow of van de Geer (2008), we first consider the case where the
weights $\sigma_k := [E\psi_k^2(X)]^{1/2}$ are fixed, then discuss briefly the case
with random weights $\hat\sigma_k$.



2. Assumptions 

We impose five basic assumptions in this section. Assumptions A, B, and C are
identical to the corresponding assumptions in van de Geer (2008). Assumption
D has a similar flavor to the assumption (A2) in Bühlmann (2006) for the
persistency property of boosting method in high-dimensional linear regression
models. Here it replaces the Lipschitz assumption in van de Geer (2008).
Assumption E is commonly used for survival models with censored data, see for
example, Andersen and Gill (1982).

Assumption A. $K_m := \max_{1 \le k \le m} \{ \|\psi_k\|_\infty / \sigma_k \} < \infty$.

Assumption B. There exists an $\eta > 0$ and strictly convex increasing $G$,
such that for all $\theta \in \Theta$ with $\|f_\theta - \bar f\|_\infty \le \eta$, one
has $\mathcal{E}(f_\theta) \ge G(\|f_\theta - \bar f\|)$.

Assumption C. There exists a function $D(\cdot)$ on the subsets of the index set
$\{1, \dots, m\}$, such that for all $\mathcal{K} \subset \{1, \dots, m\}$, and for all
$\theta \in \Theta$ and $\tilde\theta \in \Theta$, we have
$\sum_{k \in \mathcal{K}} \sigma_k |\theta_k - \tilde\theta_k| \le \sqrt{D(\mathcal{K})} \|f_\theta - f_{\tilde\theta}\|$.

Assumption D. $L_m := \sup_{\theta \in \Theta} \sum_{k=1}^m |\theta_k| < \infty$.

Assumption E. The observation time stops at a finite time $\tau > 0$ with
$\pi := P(Y \ge \tau) > 0$.

The convex conjugate of function $G$ given in Assumption B is denoted by $H$
such that $uv \le G(u) + H(v)$. A typical choice of $G$ is a quadratic function with
some constant $C_0$, i.e. $G(u) = u^2 / C_0$, see van de Geer (2008).
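
For the quadratic choice the conjugate can be written out explicitly. The short derivation below assumes the usual definition $H(v) = \sup_{u \ge 0}\{uv - G(u)\}$, which is consistent with the inequality $uv \le G(u) + H(v)$ stated above.

```latex
\begin{align*}
H(v) &= \sup_{u \ge 0} \{ uv - u^2 / C_0 \}, \\
\frac{d}{du}\bigl( uv - u^2 / C_0 \bigr) &= v - 2u / C_0 = 0
  \iff u = C_0 v / 2, \\
H(v) &= \frac{C_0 v^2}{2} - \frac{C_0 v^2}{4} = \frac{C_0 v^2}{4},
  \qquad v \ge 0 .
\end{align*}
```

Under this choice, any term of the form $H(v)$ scales quadratically in its argument.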

From Assumptions A, D and E, we have for any $\theta \in \Theta$,

$$e^{|f_\theta(X_i)|} \le e^{K_m L_m \sigma_{(m)}} =: U_m < \infty \qquad (2.1)$$

for all $i$, where $\sigma_{(m)} = \max_{1 \le k \le m} \sigma_k$.

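Let $I(\theta) := \sum_{k=1}^m \sigma_k |\theta_k|$ be the theoretical $l_1$ norm of
$\theta$, and $\hat I(\theta) := \sum_{k=1}^m \hat\sigma_k |\theta_k|$ be the empirical
$l_1$ norm. For any $\theta$ and $\tilde\theta$ in $\Theta$, denote

$$I_1(\theta | \tilde\theta) := \sum_{k: \tilde\theta_k \ne 0} \sigma_k |\theta_k|, \qquad I_2(\theta | \tilde\theta) := I(\theta) - I_1(\theta | \tilde\theta).$$

Similarly we have corresponding empirical versions,

$$\hat I_1(\theta | \tilde\theta) := \sum_{k: \tilde\theta_k \ne 0} \hat\sigma_k |\theta_k|, \qquad \hat I_2(\theta | \tilde\theta) := \hat I(\theta) - \hat I_1(\theta | \tilde\theta).$$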
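
The split of the weighted $l_1$ norm into a part on the support of a reference vector and a part off that support can be sketched in a few lines of Python. This assumes the van de Geer (2008) convention that the first part sums over the indices where the reference vector is nonzero; the function name `split_norm` is hypothetical.

```python
import numpy as np

def split_norm(theta, theta_ref, sigma):
    """Return (I, I1, I2) for the weighted l1 norm of theta.

    I1 sums sigma_k * |theta_k| over indices where theta_ref is nonzero,
    and I2 = I - I1 collects the remaining indices.
    """
    theta = np.asarray(theta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    active = np.asarray(theta_ref) != 0
    weighted = sigma * np.abs(theta)
    total = float(weighted.sum())
    on_support = float(weighted[active].sum())
    return total, on_support, total - on_support
```

Passing empirical weights in place of `sigma` gives the empirical versions of the same quantities.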



3. Main results 



3.1. Non-random normalization weights in the penalty 



We show that a similar result to Theorem A.4 of van de Geer (2008) holds
for the Cox model. Suppose that $\sigma_k = [E\psi_k^2(X)]^{1/2}$ are known and
consider the estimator

$$\hat\theta_n := \arg\min_{\theta \in \Theta} \{ l_n(\theta) + \lambda_n I(\theta) \}.$$

Denote the empirical probability measure based on the sample
$\{(X_i, Y_i, \Delta_i) : i = 1, \dots, n\}$ by $P_n$. Let $\varepsilon_1, \dots, \varepsilon_n$
be a Rademacher sequence, independent of the training data
$(X_1, Y_1, \Delta_1), \dots, (X_n, Y_n, \Delta_n)$. We fix some $\theta^* \in \Theta$ and
denote $\mathcal{F}_M := \{ f_\theta : \theta \in \Theta, I(\theta - \theta^*) \le M \}$ for
some $M > 0$. For any $\theta$ where $I(\theta - \theta^*) \le M$, denote

$$Z_\theta(M) := |(P_n - P)[\gamma_{f_\theta} - \gamma_{f_{\theta^*}}]| = \big| [\tilde l_n(\theta) - l(\theta)] - [\tilde l_n(\theta^*) - l(\theta^*)] \big|.$$

Note that van de Geer (2008) has considered the supremum of the above
$Z_\theta(M)$ over $\mathcal{F}_M$. We find that the pointwise argument is adequate
for our purpose because only the lasso estimator is of interest, and that the
calculation with $\sup_{f_\theta \in \mathcal{F}_M} Z_\theta(M)$ in van de Geer (2008)
does not apply to the Cox model due to the lack of Lipschitz property.
Lemma 3.1. Under Assumptions A, D and E, for all $\theta$ satisfying
$I(\theta - \theta^*) \le M$, we have

$$EZ_\theta(M) \le \bar a_n M,$$

where

$$\bar a_n = 4 a_n, \qquad a_n = \sqrt{\frac{2 K_m^2 \log(2m)}{n}} + \frac{K_m \log(2m)}{n}.$$
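
To get a feel for the size of the constant in Lemma 3.1, the rate can be evaluated numerically. The sketch below assumes the rate has the form $a_n = \sqrt{2K_m^2\log(2m)/n} + K_m\log(2m)/n$ with $\bar a_n = 4a_n$, which is how the definition in the lemma is read here; the function names and the sample values of `n`, `m`, and `k_m` are illustrative only.

```python
import math

def a_n(n, m, k_m):
    """Assumed rate: sqrt(2 K_m^2 log(2m) / n) + K_m log(2m) / n."""
    log_term = math.log(2 * m)
    return math.sqrt(2.0 * k_m ** 2 * log_term / n) + k_m * log_term / n

def a_bar_n(n, m, k_m):
    """Constant multiplying M in the bound on E Z_theta(M), assuming 4 * a_n."""
    return 4.0 * a_n(n, m, k_m)

if __name__ == "__main__":
    # The rate shrinks like sqrt(log(m) / n), so m may grow quickly with n.
    for n in (100, 1000, 10000):
        print(n, a_bar_n(n, m=5000, k_m=1.0))
```

The dependence on the dimension enters only through $\log(2m)$, which is what allows the number of basis functions to be much larger than the sample size.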



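Proof. By the symmetrization theorem, see e.g. van der Vaart and Wellner (1996)
or Theorem A.2 in van de Geer (2008), for a class of only one function we have

$$\begin{aligned}
EZ_\theta(M) &\le 2E\Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \big\{ [f_\theta(X_i) - \log\mu(Y_i; f_\theta)]\Delta_i - [f_{\theta^*}(X_i) - \log\mu(Y_i; f_{\theta^*})]\Delta_i \big\} \Big| \\
&\le 2E\Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \{ f_\theta(X_i) - f_{\theta^*}(X_i) \} \Delta_i \Big| + 2E\Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \{ \log\mu(Y_i; f_\theta) - \log\mu(Y_i; f_{\theta^*}) \} \Delta_i \Big| \\
&= A + B.
\end{aligned}$$

For A we have

$$A \le 2 \Big( \sum_{k=1}^m \sigma_k |\theta_k - \theta_k^*| \Big) E\Big( \max_{1 \le k \le m} \Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \Delta_i \psi_k(X_i) / \sigma_k \Big| \Big).$$

Applying Lemma A.1 in van de Geer (2008) with $\eta_n = K_m$ and $\tau_n^2 = K_m^2$, we
obtain

$$E\Big( \max_{1 \le k \le m} \Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \Delta_i \psi_k(X_i) / \sigma_k \Big| \Big) \le a_n.$$

Thus we have

$$A \le 2 a_n M. \qquad (3.1)$$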



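For B, instead of using the contraction theorem that requires Lipschitz, we
use the mean value theorem in the following:

$$\begin{aligned}
\Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \{ \log\mu(Y_i; f_\theta) - \log\mu(Y_i; f_{\theta^*}) \} \Delta_i \Big|
&= \Big| \frac{1}{n} \sum_{i=1}^n \sum_{k=1}^m \varepsilon_i \Delta_i \sigma_k (\theta_k - \theta_k^*) F_{\theta^{**}}(k, Y_i) \Big| \\
&\le \sum_{k=1}^m \sigma_k |\theta_k - \theta_k^*| \max_{1 \le k \le m} \Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \Delta_i F_{\theta^{**}}(k, Y_i) \Big| \\
&\le M \max_{1 \le k \le m} \Big| \frac{1}{n} \sum_{i=1}^n \varepsilon_i \Delta_i F_{\theta^{**}}(k, Y_i) \Big|,
\end{aligned}$$

where $\theta^{**}$ is between $\theta$ and $\theta^*$, and

$$F_{\theta^{**}}(k, t) = \frac{E[1(Y \ge t) \psi_k(X) e^{f_{\theta^{**}}(X)}]}{\mu(t; f_{\theta^{**}}) \sigma_k} \le \frac{(\|\psi_k\|_\infty / \sigma_k) E[1(Y \ge t) e^{f_{\theta^{**}}(X)}]}{\mu(t; f_{\theta^{**}})} \le K_m. \qquad (3.2)$$

Since for all $i$,
$E[\varepsilon_i \Delta_i F_{\theta^{**}}(k, Y_i)] = 0$,
$\|\varepsilon_i \Delta_i F_{\theta^{**}}(k, Y_i)\|_\infty \le K_m$, and

$$\frac{1}{n} \sum_{i=1}^n E[\varepsilon_i \Delta_i F_{\theta^{**}}(k, Y_i)]^2 \le \frac{1}{n} \sum_{i=1}^n E[F_{\theta^{**}}(k, Y_i)]^2 \le K_m^2,$$

following Lemma A.1 in van de Geer (2008), we obtain

$$B \le 2 a_n M. \qquad (3.3)$$

Combining (3.1) and (3.3), the upper bound for $EZ_\theta(M)$ is achieved. □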

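We can now bound $Z_\theta(M)$ using Bousquet's concentration theorem, provided
in van de Geer (2008) as Theorem A.1.

Corollary 3.1. Under Assumptions A, D and E, for all $M > 0$, $r_1 > 0$ and all
$\theta$ satisfying $I(\theta - \theta^*) \le M$, it holds that

$$P\big( Z_\theta(M) \ge \bar\lambda^A_{n,0} M \big) \le \exp(-n \bar a_n^2 r_1^2),$$

where

$$\bar\lambda^A_{n,0} := \bar\lambda^A_{n,0}(r_1) := \bar a_n \Big( 1 + 2 r_1 \sqrt{2 (K_m^2 + \bar a_n K_m)} + \frac{4 r_1^2 \bar a_n K_m}{3} \Big).$$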

S. Kong and B. Nan/Lasso Cox Regression 7 

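Proof. Using the triangular inequality and the mean value theorem, we obtain

$$\begin{aligned}
|\gamma_{f_\theta} - \gamma_{f_{\theta^*}}| &\le |f_\theta(X) - f_{\theta^*}(X)| \Delta + |\log\mu(Y; f_\theta) - \log\mu(Y; f_{\theta^*})| \Delta \\
&\le \sum_{k=1}^m \sigma_k |\theta_k - \theta_k^*| \frac{\|\psi_k\|_\infty}{\sigma_k} + |\log\mu(Y; f_\theta) - \log\mu(Y; f_{\theta^*})| \\
&\le M K_m + \sum_{k=1}^m \sigma_k |\theta_k - \theta_k^*| \cdot \max_{1 \le k \le m} |F_{\theta^{**}}(k, Y)| \\
&\le 2 M K_m,
\end{aligned}$$

where $\theta^{**}$ is between $\theta$ and $\theta^*$, and $F_{\theta^{**}}(k, Y)$ is defined
in (3.2). So we have

$$\|\gamma_{f_\theta} - \gamma_{f_{\theta^*}}\|_\infty \le 2 M K_m,$$

and

$$P(\gamma_{f_\theta} - \gamma_{f_{\theta^*}})^2 \le 4 M^2 K_m^2.$$

Therefore, in view of Bousquet's concentration theorem and Lemma 3.1, for all
$M > 0$ and $r_1 > 0$,

$$P\Big( Z_\theta(M) \ge \bar a_n M \Big[ 1 + 2 r_1 \sqrt{2 (K_m^2 + \bar a_n K_m)} + \frac{4 r_1^2 \bar a_n K_m}{3} \Big] \Big) \le \exp(-n \bar a_n^2 r_1^2).$$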

Now for any $\theta$ satisfying $I(\theta - \theta^*) \le M$, we bound

$$R_\theta(M) := \big| [l_n(\theta) - \tilde l_n(\theta)] - [l_n(\theta^*) - \tilde l_n(\theta^*)] \big|,$$

which is equal to



□ 



1 " 



~ E \v f\ log - E 



< sup 

0<t<r 



log - > . s log - 2^ 



i=i 



i=i 



By the mean value theorem, we have 
Re{M) < sup

0<t<T 



fe=i 



sup 

0<t<T 



fc=l 



i;[l(y >t){V;fc(X)/afc}e/''"W] 



< M sup 

0<t<T 



sup < max 

0<t<T I i<fc<m 

- E 



1 

1 " 

n ^-^ 

1=1 

l{Y>t){MX)/ak}ef^"^''^ 

n 

J2 HY^ > t)efo"^^'^ ~ E \l{Y > t)efo"^^^ 



(3.4) 



where 6** is between and 0* , and by (2.1) we have 



sup 

0<t<T 



n ^ — ^ 



1 " 

77 Z 



>t) 



j=i 



(3.5) 



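Lemma 3.2. Under Assumption E, we have

$$P\Big( \frac{1}{n} \sum_{i=1}^n 1(Y_i \ge \tau) \le \frac{\pi}{2} \Big) \le 2 e^{-n\pi^2/2}.$$

Proof. This is obtained directly from Massart (1990) by taking $r = \pi\sqrt{n}/2$ in
the following:

$$P\Big( \frac{1}{n} \sum_{i=1}^n 1(Y_i \ge \tau) \le \frac{\pi}{2} \Big) \le P\Big( \sup_{0 \le t \le \tau} \sqrt{n} \Big| \frac{1}{n} \sum_{i=1}^n 1(Y_i \ge t) - P(Y \ge t) \Big| \ge r \Big) \le 2 e^{-2r^2}.$$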



□ 
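
The tail bound of Lemma 3.2 can be checked by simulation. The sketch below reads the lemma as $P(\frac{1}{n}\sum_i 1(Y_i \ge \tau) \le \pi/2) \le 2e^{-n\pi^2/2}$ with $\pi = P(Y \ge \tau)$, and uses the fact that the number of subjects still at risk at $\tau$ is Binomial($n$, $\pi$). The function names, the seed, and the values of `n` and `pi_tau` are illustrative assumptions.

```python
import math
import numpy as np

def tail_bound(n, pi_tau):
    """Right-hand side 2 * exp(-n * pi^2 / 2) of the lemma as read above."""
    return 2.0 * math.exp(-n * pi_tau ** 2 / 2.0)

def simulate_frequency(n, pi_tau, reps=20000, seed=0):
    """Monte Carlo frequency of the event {at-risk fraction at tau <= pi / 2}."""
    rng = np.random.default_rng(seed)
    # Each subject is at risk at tau independently with probability pi_tau.
    counts = rng.binomial(n, pi_tau, size=reps)
    return float(np.mean(counts / n <= pi_tau / 2.0))

if __name__ == "__main__":
    n, pi_tau = 50, 0.3
    print(simulate_frequency(n, pi_tau), tail_bound(n, pi_tau))
```

The simulated frequency is typically far below the bound, which reflects that the Massart inequality controls a supremum over all $t$ rather than the single point $\tau$.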
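Lemma 3.3. Under Assumptions A, D and E, for all $\theta$ we have

$$P\Big( \sup_{0 \le t \le \tau} \Big| \frac{1}{n} \sum_{i=1}^n 1(Y_i \ge t) e^{f_\theta(X_i)} - \mu(t; f_\theta) \Big| \ge U_m \bar a_n r_1 \Big) \le \frac{1}{5} W^2 e^{-n \bar a_n^2 r_1^2}, \qquad (3.6)$$

where $W$ is a constant that only depends on $K = \sqrt{2}$.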

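Proof. For a class of functions indexed by $t$,
$\mathcal{G} = \{ 1(y \ge t) e^{f_\theta(x)} / U_m : t \in [0, \tau], y \in R, e^{f_\theta(x)} \le U_m \}$,
we calculate its bracketing number. For any $\epsilon > 0$, let $t_i$ be the $i$-th
$\lceil 1/\epsilon \rceil$ quantile of $Y$, i.e.,

$$P(Y \le t_i) = i\epsilon, \qquad i = 1, \dots, \lceil 1/\epsilon \rceil - 1,$$

where $\lceil x \rceil$ is the smallest integer that is greater than or equal to $x$.
Furthermore, denote $t_0 = 0$ and $t_{\lceil 1/\epsilon \rceil} = +\infty$. For
$i = 1, \dots, \lceil 1/\epsilon \rceil$, define brackets $[L_i, U_i]$ with

$$L_i(x, y) = 1(y \ge t_i) e^{f_\theta(x)} / U_m, \qquad U_i(x, y) = 1(y > t_{i-1}) e^{f_\theta(x)} / U_m,$$

such that $L_i(x, y) \le 1(y \ge t) e^{f_\theta(x)} / U_m \le U_i(x, y)$ when
$t_{i-1} < t \le t_i$. Since

$$\Big\{ E\Big[ \frac{e^{f_\theta(X)}}{U_m} \{ 1(Y \ge t_i) - 1(Y > t_{i-1}) \} \Big]^2 \Big\}^{1/2} \le \{ P(t_{i-1} < Y < t_i) \}^{1/2} \le \sqrt{\epsilon},$$

we have $N_{[\,]}(\sqrt{\epsilon}, \mathcal{G}, L_2) \le 2/\epsilon$, which yields

$$N_{[\,]}(\epsilon, \mathcal{G}, L_2) \le \frac{2}{\epsilon^2} = \Big( \frac{K}{\epsilon} \Big)^2,$$

where $K = \sqrt{2}$. Thus, from Theorem 2.14.9 in van der Vaart and Wellner (1996),
we have for any $r > 0$,

$$P\Big( \sqrt{n} \sup_{0 \le t \le \tau} \Big| \frac{1}{n} \sum_{i=1}^n \frac{1(Y_i \ge t) e^{f_\theta(X_i)}}{U_m} - \frac{\mu(t; f_\theta)}{U_m} \Big| \ge r \Big) \le \frac{1}{2} W^2 r^2 e^{-2r^2} \le \frac{1}{5} W^2 e^{-r^2},$$

where $W$ is a constant that only depends on $K$. Note that $r^2 e^{-r^2}$ is bounded
by $e^{-1}$. Letting $r = \sqrt{n}\, \bar a_n r_1$, we obtain (3.6).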

□ 
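
The quantile grid used for the brackets in the proof of Lemma 3.3 can be illustrated numerically. The sketch below builds the points $t_0 = 0 < t_1 < \dots < t_N = +\infty$ with $N = \lceil 1/\epsilon \rceil$ and checks that each interval carries probability at most $\epsilon$. The Exponential(1) distribution for $Y$ and all function names are assumptions chosen only to make the example self-contained.

```python
import math

def exp_quantile(p):
    """Quantile function of an Exponential(1) variable (illustrative choice)."""
    return -math.log(1.0 - p)

def exp_cdf(t):
    """Distribution function of an Exponential(1) variable."""
    return 1.0 - math.exp(-t)

def quantile_grid(eps, quantile):
    """Grid t_0 = 0, t_i = quantile(i * eps) for i < N, t_N = +inf, N = ceil(1/eps)."""
    n_brackets = math.ceil(1.0 / eps)
    inner = [quantile(i * eps) for i in range(1, n_brackets)]
    return [0.0] + inner + [math.inf]

def bracket_masses(grid, cdf):
    """P(t_{i-1} < Y <= t_i) for each consecutive pair of grid points."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(grid[:-1], grid[1:])]

if __name__ == "__main__":
    grid = quantile_grid(0.3, exp_quantile)
    print(grid)
    print(bracket_masses(grid, exp_cdf))
```

With $\epsilon = 0.3$ this produces four brackets, the last of which has mass smaller than $\epsilon$ because $1/\epsilon$ is not an integer.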
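Lemma 3.4. Under Assumptions A, D and E, for all $\theta$ we have

$$P\Big( \sup_{0 \le t \le \tau} \max_{1 \le k \le m} \Big| \frac{1}{n} \sum_{i=1}^n 1(Y_i \ge t) \frac{\psi_k(X_i)}{\sigma_k} e^{f_\theta(X_i)} - E\Big[ 1(Y \ge t) \frac{\psi_k(X)}{\sigma_k} e^{f_\theta(X)} \Big] \Big| \ge K_m U_m \Big[ \bar a_n r_1 + \sqrt{\frac{\log(2m)}{n}} \Big] \Big) \le \frac{1}{10} W^2 e^{-n \bar a_n^2 r_1^2}. \qquad (3.7)$$
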

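Proof. Consider the classes of functions indexed by $t$,

$$\mathcal{G}^k = \{ 1(y \ge t) e^{f_\theta(x)} \psi_k(x) / (\sigma_k K_m U_m) : t \in [0, \tau], y \in R, |e^{f_\theta(x)} \psi_k(x) / \sigma_k| \le K_m U_m \}, \qquad k = 1, \dots, m.$$

Using the same argument as in the proof of Lemma 3.3, we have

$$N_{[\,]}(\epsilon, \mathcal{G}^k, L_2) \le \Big( \frac{K}{\epsilon} \Big)^2,$$

where $K = \sqrt{2}$, and then for any $r > 0$,

$$P\Big( \sqrt{n} \sup_{0 \le t \le \tau} \Big| \frac{1}{n} \sum_{i=1}^n \frac{1(Y_i \ge t) e^{f_\theta(X_i)} \psi_k(X_i)}{\sigma_k K_m U_m} - E\Big[ \frac{1(Y \ge t) e^{f_\theta(X)} \psi_k(X)}{\sigma_k K_m U_m} \Big] \Big| \ge r \Big) \le \frac{1}{5} W^2 e^{-r^2}.$$

Thus, by the union bound over $k = 1, \dots, m$, we have

$$P\Big( \sqrt{n} \sup_{0 \le t \le \tau} \max_{1 \le k \le m} \Big| \frac{1}{n} \sum_{i=1}^n \frac{1(Y_i \ge t) e^{f_\theta(X_i)} \psi_k(X_i)}{\sigma_k K_m U_m} - E\Big[ \frac{1(Y \ge t) e^{f_\theta(X)} \psi_k(X)}{\sigma_k K_m U_m} \Big] \Big| \ge r \Big) \le \frac{m}{5} W^2 e^{-r^2}.$$

Let $\log(2m) - r^2 = -n \bar a_n^2 r_1^2$, i.e. $r = \sqrt{n \bar a_n^2 r_1^2 + \log(2m)}$. Since

$$\sqrt{\bar a_n^2 r_1^2 + \frac{\log(2m)}{n}} \le \bar a_n r_1 + \sqrt{\frac{\log(2m)}{n}},$$

we obtain (3.7).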



□ 



Corollary 3.2. Under Assumptions A, D and E, for all $M > 0$ and all $\theta$ that
satisfy $I(\theta - \theta^*) \le M$, we have

$$P\big( R_\theta(M) \ge \bar\lambda^B_{n,0} M \big) \le 2 \exp(-n\pi^2/2) + \frac{3}{10} W^2 \exp(-n \bar a_n^2 r_1^2), \qquad (3.8)$$



where 



2a„ri 



log(2m) 



Proof. From inequalities (3.4) and (3.5) we have

$$P\big( R_\theta(M) \le \bar\lambda^B_{n,0} M \big) \ge P(E_1^c \cap E_2^c \cap E_3^c),$$

where the events $E_1$, $E_2$ and $E_3$ are defined in the following:



E, = 



Eo = 



E. = 



n ~j 

-^i(r, >i) 
11 ^ — ^ 



sup 

0<t<T 



max sup 

0<fc<mo<t<r 



> Unianri 



-yi(r,>t)^^^e/''"(^-) 



E 



L(r >t)^^^e/«"m 



Thus 



$$P\big( R_\theta(M) \ge \bar\lambda^B_{n,0} M \big) \le P(E_1) + P(E_2) + P(E_3),$$

and the result follows from Lemmas 3.2, 3.3 and 3.4.



□ 



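We now show oracle bounds for the lasso estimator $\hat\theta_n$ under Assumptions
A-E following van de Geer (2008), but using pointwise arguments. Let

$$\bar\lambda_{n,0} = \bar\lambda^A_{n,0} + \bar\lambda^B_{n,0}. \qquad (3.9)$$

Take $b > 0$, $d > 1$, and

$$d_b := d \Big( \frac{b + d}{(d - 1) b} \vee 1 \Big).$$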
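Let $D_\theta := D(\{k : \theta_k \ne 0, k = 1, \dots, m\})$ be the number of nonzero
$\theta_k$'s, where $D(\cdot)$ is given in Assumption C. Define

(A1) $\lambda_n := (1 + b) \bar\lambda_{n,0}$,

(A2) $\mathcal{V}_\theta := 2\delta H\big( 2\lambda_n \sqrt{D_\theta} / \delta \big)$, where $0 < \delta < 1$,

(A3) $\theta_n^* := \arg\min_{\theta \in \Theta} \{ \mathcal{E}(f_\theta) + \mathcal{V}_\theta \}$,

(A4) $\epsilon_n^* := (1 + \delta) \mathcal{E}(f_{\theta_n^*}) + \mathcal{V}_{\theta_n^*}$,

(A5) $\zeta_n^* := \epsilon_n^* / \bar\lambda_{n,0}$,

(A6) $\theta(\epsilon_n^*) := \arg\min_{\theta \in \Theta, \, I(\theta - \theta_n^*) \le d_b \zeta_n^* / b} \{ \delta \mathcal{E}(f_\theta) - 2\lambda_n I_1(\theta - \theta_n^* | \theta_n^*) \}$.

We also impose the following conditions:

Condition I($b, \delta$). $\|f_{\theta_n^*} - \bar f\|_\infty \le \eta$.

Condition II($b, \delta, d$). $\|f_{\theta(\epsilon_n^*)} - \bar f\|_\infty \le \eta$.

In both conditions, $\eta$ is given in Assumption B.

Lemma 3.5. Suppose Conditions I($b, \delta$) and II($b, \delta, d$) are met. For all
$\theta \in \Theta$ with $I(\theta - \theta_n^*) \le d_b \zeta_n^* / b$, it holds that

$$2\lambda_n I_1(\theta - \theta_n^*) \le \delta \mathcal{E}(f_\theta) + \epsilon_n^* - \mathcal{E}(f_{\theta_n^*}).$$

Proof. The proof is exactly the same as that of Lemma A.4 in van de Geer
(2008), with $\bar\lambda_{n,0}$ defined in (3.9). □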

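Lemma 3.6. Suppose Conditions I($b, \delta$) and II($b, \delta, d$) are met. Consider any
random $\tilde\theta \in \Theta$ with
$l_n(\tilde\theta) + \lambda_n I(\tilde\theta) \le l_n(\theta_n^*) + \lambda_n I(\theta_n^*)$.
Let $1 < d_0 \le d_b$. It holds that

$$P\Big( I(\tilde\theta - \theta_n^*) \le d_0 \frac{\zeta_n^*}{b} \Big) \le P\Big( I(\tilde\theta - \theta_n^*) \le \Big( \frac{d_0 + b}{1 + b} \Big) \frac{\zeta_n^*}{b} \Big) + \Big( 1 + \frac{3}{10} W^2 \Big) \exp(-n \bar a_n^2 r_1^2) + 2 \exp(-n\pi^2/2).$$

Proof. The idea is similar to the proof of Lemma A.5 in van de Geer (2008).
Let $\tilde{\mathcal{E}} = \mathcal{E}(f_{\tilde\theta})$ and $\mathcal{E}^* = \mathcal{E}(f_{\theta_n^*})$.
We will use the short notation $I_1(\theta) = I_1(\theta | \theta_n^*)$ and
$I_2(\theta) = I_2(\theta | \theta_n^*)$. Since
$l_n(\tilde\theta) + \lambda_n I(\tilde\theta) \le l_n(\theta_n^*) + \lambda_n I(\theta_n^*)$,
on the set where $I(\tilde\theta - \theta_n^*) \le d_0 \zeta_n^* / b$ and
$Z_{\tilde\theta}(d_0 \zeta_n^* / b) \le d_0 \zeta_n^* / b \cdot \bar\lambda^A_{n,0}$, we have

$$\begin{aligned}
R_{\tilde\theta}(d_0 \zeta_n^* / b) &\ge [l_n(\theta_n^*) + \lambda_n I(\theta_n^*)] - [l_n(\tilde\theta) + \lambda_n I(\tilde\theta)] - \lambda_n I(\theta_n^*) + \lambda_n I(\tilde\theta) - [\tilde l_n(\theta_n^*) - \tilde l_n(\tilde\theta)] \\
&\ge -\lambda_n I(\theta_n^*) + \lambda_n I(\tilde\theta) - [\tilde l_n(\theta_n^*) - \tilde l_n(\tilde\theta)] \\
&\ge -\lambda_n I(\theta_n^*) + \lambda_n I(\tilde\theta) - [l(\theta_n^*) - l(\tilde\theta)] - d_0 \zeta_n^* / b \cdot \bar\lambda^A_{n,0} \\
&\ge -\lambda_n I(\theta_n^*) + \lambda_n I(\tilde\theta) - \mathcal{E}^* + \tilde{\mathcal{E}} - d_0 \bar\lambda^A_{n,0} \zeta_n^* / b. \qquad (3.10)
\end{aligned}$$

By (3.8) we know that $R_{\tilde\theta}(d_0 \zeta_n^* / b)$ is bounded by
$d_0 \bar\lambda^B_{n,0} \zeta_n^* / b$ with probability at least
$1 - \frac{3}{10} W^2 \exp(-n \bar a_n^2 r_1^2) - 2 \exp(-n\pi^2/2)$, then we have

$$\tilde{\mathcal{E}} + \lambda_n I(\tilde\theta) \le \bar\lambda^B_{n,0} d_0 \zeta_n^* / b + \mathcal{E}^* + \lambda_n I(\theta_n^*) + \bar\lambda^A_{n,0} d_0 \zeta_n^* / b.$$

Since $I(\tilde\theta) = I_1(\tilde\theta) + I_2(\tilde\theta)$ and
$I(\theta_n^*) = I_1(\theta_n^*)$, using the triangular inequality, we obtain

$$\begin{aligned}
\tilde{\mathcal{E}} + (1 + b) \bar\lambda_{n,0} I_2(\tilde\theta) &\le \bar\lambda_{n,0} d_0 \zeta_n^* / b + \mathcal{E}^* + (1 + b) \bar\lambda_{n,0} I_1(\theta_n^*) - (1 + b) \bar\lambda_{n,0} I_1(\tilde\theta) \\
&\le \bar\lambda_{n,0} d_0 \zeta_n^* / b + \mathcal{E}^* + (1 + b) \bar\lambda_{n,0} I_1(\tilde\theta - \theta_n^*). \qquad (3.11)
\end{aligned}$$

The remainder of the proof follows exactly the corresponding part of the proof
of Lemma A.5 in van de Geer (2008). □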

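Corollary 3.3. Suppose Conditions I($b, \delta$) and II($b, \delta, d$) are met. Consider
any random $\tilde\theta \in \Theta$ with
$l_n(\tilde\theta) + \lambda_n I(\tilde\theta) \le l_n(\theta_n^*) + \lambda_n I(\theta_n^*)$.
Let $1 < d_0 \le d_b$. It holds that

$$P\Big( I(\tilde\theta - \theta_n^*) \le d_0 \frac{\zeta_n^*}{b} \Big) \le P\Big( I(\tilde\theta - \theta_n^*) \le [1 + (d_0 - 1)(1 + b)^{-N}] \frac{\zeta_n^*}{b} \Big) + N \Big\{ \Big( 1 + \frac{3}{10} W^2 \Big) \exp(-n \bar a_n^2 r_1^2) + 2 \exp(-n\pi^2/2) \Big\}.$$

Proof. Repeat Lemma 3.6 $N$ times. □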
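Lemma 3.7. Suppose Conditions I($b, \delta$) and II($b, \delta, d$) are met. Define

$$\tilde\theta_s = s \hat\theta_n + (1 - s) \theta_n^*,$$

where

$$s = \frac{d \zeta_n^*}{d \zeta_n^* + b I(\hat\theta_n - \theta_n^*)}.$$

Then for any integer $N$, with probability at least

$$1 - N \Big\{ \Big( 1 + \frac{3}{10} W^2 \Big) \exp(-n \bar a_n^2 r_1^2) + 2 \exp(-n\pi^2/2) \Big\}$$

we have

$$I(\tilde\theta_s - \theta_n^*) \le \big( 1 + (d - 1)(1 + b)^{-N} \big) \frac{\zeta_n^*}{b}.$$

Proof. Since the negative log partial likelihood $l_n(\theta)$ and the lasso penalty
are both convex with respect to $\theta$, applying Corollary 3.3, we obtain the above
inequality. □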
Lemma 3.8. Suppose Conditions I($b, \delta$) and II($b, \delta, d$) are met. Let
$N_1 \in \mathbb{N} := \{1, 2, \dots\}$ and $N_2 \in \mathbb{N} \cup \{0\}$. Define
$\delta_1 = (1 + b)^{-N_1}$ and $\delta_2 = (1 + b)^{-N_2}$. For any $n$, with probability
at least

$$1 - (N_1 + N_2) \Big\{ \Big( 1 + \frac{3}{10} W^2 \Big) \exp(-n \bar a_n^2 r_1^2) + 2 \exp(-n\pi^2/2) \Big\},$$

we have

$$I(\hat\theta_n - \theta_n^*) \le d(\delta_1, \delta_2) \frac{\zeta_n^*}{b},$$

where

$$d(\delta_1, \delta_2) = 1 + \frac{1 + (d^2 - 1)\delta_1}{(d - 1)(1 - \delta_1)} \delta_2.$$

Proof. The proof is exactly the same as that of Lemma A.7 in van de Geer
(2008), with a slightly different probability bound. □

We now provide the major theorem of the oracle inequalities for the Cox 
model lasso estimator. 

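Theorem 3.1. Suppose Assumptions A-E and Conditions I($b, \delta$) and
II($b, \delta, d$) are met. Let

$$\Delta(b, \delta, \delta_1, \delta_2) := d(\delta_1, \delta_2) \frac{1 - \delta^2}{\delta b} \vee 1.$$

We have with probability at least

$$1 - \Big\{ \log_{1+b} \frac{(1 + b)^2 \Delta(b, \delta, \delta_1, \delta_2)}{\delta_1 \delta_2} \Big\} \Big\{ \Big( 1 + \frac{3}{10} W^2 \Big) \exp(-n \bar a_n^2 r_1^2) + 2 \exp(-n\pi^2/2) \Big\}$$

that

$$\mathcal{E}(f_{\hat\theta_n}) \le \frac{\epsilon_n^*}{1 - \delta},$$

and moreover,

$$I(\hat\theta_n - \theta_n^*) \le d(\delta_1, \delta_2) \frac{\zeta_n^*}{b}.$$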

Proof. The proof follows the same ideas as the proof of Theorem A.4 in van de
Geer (2008), with the exceptions of pointwise arguments and slightly different
probability bounds. Since this is the major result, to be self-contained, we
provide a detailed proof here despite the amount of overlap.

Similar to van de Geer (2008), we define
$\hat{\mathcal{E}} := \mathcal{E}(f_{\hat\theta_n})$ and $\mathcal{E}^* := \mathcal{E}(f_{\theta_n^*})$;
use the notation $I_1(\theta) := I_1(\theta | \theta_n^*)$ and
$I_2(\theta) := I_2(\theta | \theta_n^*)$; set

$$c := \frac{\delta b}{1 - \delta^2},$$

and consider the cases (a) $c < d(\delta_1, \delta_2)$ and (b) $c \ge d(\delta_1, \delta_2)$.

(a) Consider $c < d(\delta_1, \delta_2)$. Let $J$ be an integer satisfying
$(1 + b)^{J-1} c \le d(\delta_1, \delta_2)$ and $(1 + b)^J c > d(\delta_1, \delta_2)$. We
consider the cases (a1) $c \zeta_n^* / b < I(\hat\theta_n - \theta_n^*) \le d(\delta_1, \delta_2) \zeta_n^* / b$
and (a2) $I(\hat\theta_n - \theta_n^*) \le c \zeta_n^* / b$.

(a1) If $c \zeta_n^* / b < I(\hat\theta_n - \theta_n^*) \le d(\delta_1, \delta_2) \zeta_n^* / b$, then

$$(1 + b)^{j-1} c \frac{\zeta_n^*}{b} < I(\hat\theta_n - \theta_n^*) \le (1 + b)^j c \frac{\zeta_n^*}{b}$$

for some $j \in \{1, \dots, J\}$. Let

$$d_0 = c (1 + b)^{j-1} \le d(\delta_1, \delta_2) \le d_b.$$

From Corollary 3.1, with probability at least 1 — exp {—na^r\) we have Zg ((1 + 

^')doC/&) < (1 + b)d^\%CJb. Since /„(^„) + A„/(^„) < Z„(0;j + A„/(0,1), from 
equation (3.10), we have 

£ + A„/(4) < % ((1 + ) + ^* + ^"^(^") + (1 + b)Kodoj-- 

By (3.8), i?e„ ((1 + b)doC*b) is bounded by (1 + b)X^odoCn/b with probability at 
least 

3 

1 - —W'^ exp {-nalrl) - 2 exp (-ri7rV2) , 



then we have 



f + (l + 6)A„,o/(^„) < (l + 6)Afodo^+f* + (l + 6)A„.o/(^?:j 



A* 

A 1 Sn 



+ (l + fe)AVo 5 

< (1 + 5)A„,o/(4 - e,:) + f * + (1 + &)A„,o/(0:). 

Since 1(9,,) = h{9,-,) + h^,), - 9^) = h{9,, - 9*J + hiL). and I{91) = 
/i(0*), by triangular inequality we obtain 

£ <2{l + b)\n,oIi{9n-9l)+£*. 

From Lemma 3.5, 

8 < Si + el -£*+£* ^6£ + el. 

Hence, 

^ - 1-5 

(a2) If /(0„ — 0*) < c(*/b, from equation (3.11) with do — c, with probability 
at least 

1 - I (l + ^ W^') exp {-nalrj) + 2 exp(-n7rV2) 
we have



£ + {! + 6)A„,o/(0„) < Y3^A„,oC +£* + {! + b)Xn,olK)- 

By triangular inequality, Lemma 3.5 and (A4), 
S 



£ < 



< 



1-6^ 



A„,oC +£* + {l + b)Xn,ah{0n - O 



6 1 



< 



l-(52 2 
6 1 



Hence, 



£ < 



l-(52 2 
(5 



1 * 

-el + -£. 



2(1 + (5) " 2 



2-5 



1-^2 2 2(1 + 6) 



1-5 



Furthermore, by Lemma 3.8, we have with probability at least 

l-{Ni+ N2) I (^1 + ^W^^ exp (-nalrj) + 2exp (-n7rV2)| 



that 



where 



A^i = logi+f, ( — ) , 7V2 = logi+f, 



(b) Consider c > ^2). On the set where I{dn - On) < ^2)Ci/^, from 
equation (3.11) we have with probability at least 



1-1 



+ ^W^^ exp {-nalrl) + 2 exp {-m:'^ /2) 



that 



£ + (l + 6)A„.o/(^n) < A„,od('5i,'52)%+f* + (l + ?^)A„,o/(0:) 





< 



voC+^* + (i + &)w(^:), 



l-(52 

which is the same as (a2) and leads to the same result. 
To summarize, let 

A={£< , B = |/(4 - 61) < d{S,,d2)^ 
Note that

Under case (a), we have 

p{Ar\B) 

= F(al) - PiA" n al) + P(a2) - P(A= n a2) 

> P(al) - J I (^1 + I^W^^^ cxp (-na^r?) + 2 cxp(-n7rV2) 

+ P(a2) - I (^1 + ^^^) cxpl-na^r?) + 2cxp (-n7rV2) 
= P{B) _ ( J + 1) I (^1 + cxp {-nalrl) + 2 cxp (-n7rV2) | 

> 1 - (TVi + iV2 + J + 1) I (^1 + ^W^^ exp {-nalrl) 

+ 2cxp(-n7r2/2) | 

(1 + 6)2 d{5i,52){l-5^) 



> 1 - logi+ 

3 



6162 Sb 



1 + —W^ ] exp (-na;^,rf) + 2 cxp(-n7r72) 



Under case (b), 



P(An P) 

^ P{B)-P{A'r\B) 



> P{B) - I (^1 + 1 W^2^ cxp i-nalrl) + 2 cxp {~n7r^/2) | 

> l-(7Vi+iV2 + 2)i (l + ^wAe,,p i-nalrl) 



2 exp (-?i7r2/2) 
(1 + 6)2 



1 - logl+6 
1 



S1S2 



+ ^W^^^ cxp (-nalrl) + 2 cxp(-n7r2/2) 



We thus obtain the desired result. □
3.2. Random normalization weights in the penalty

The case with random weights can be argued in exactly the same way as
in van de Geer (2008): the tail probability given in Lemma A.9 of van de Geer
(2008) is added to the probability bound in Theorem 3.1, under the same set of
conditions as for Theorem A.5 in van de Geer (2008). Details are therefore
omitted.